
    Sparse integrative clustering of multiple omics data sets

    High-resolution microarrays and second-generation sequencing platforms are powerful tools to investigate genome-wide alterations in DNA copy number, methylation and gene expression associated with a disease. An integrated genomic profiling approach measures multiple omics data types simultaneously in the same set of biological samples. Such an approach renders an integrated data resolution that would not be available with any single data type. In this study, we use penalized latent variable regression methods for joint modeling of multiple omics data types to identify common latent variables that can be used to cluster patient samples into biologically and clinically relevant disease subtypes. We consider lasso [J. Roy. Statist. Soc. Ser. B 58 (1996) 267-288], elastic net [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 301-320] and fused lasso [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 91-108] methods to induce sparsity in the coefficient vectors, revealing important genomic features that have significant contributions to the latent variables. An iterative ridge regression is used to compute the sparse coefficient vectors. In model selection, a uniform design [Monographs on Statistics and Applied Probability (1994) Chapman & Hall] is used to seek "experimental" points that are scattered uniformly across the search domain for efficient sampling of tuning parameter combinations. We compared our method to sparse singular value decomposition (SVD) and penalized Gaussian mixture model (GMM) using both real and simulated data sets. The proposed method is applied to integrate genomic, epigenomic and transcriptomic data for subtype analysis in breast and lung cancer data sets. Comment: Published at http://dx.doi.org/10.1214/12-AOAS578 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
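As a rough illustration of how a lasso-type penalty produces the sparse coefficient vectors mentioned in the abstract above, the sketch below applies the standard soft-thresholding operator to a toy loading vector. This is not the iterative ridge algorithm the authors describe. The function names, the example loadings and the penalty value `lam` are all hypothetical.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def sparsify(coefs: list[float], lam: float) -> list[float]:
    """Apply soft-thresholding element-wise to a coefficient vector."""
    return [soft_threshold(c, lam) for c in coefs]


# Hypothetical loadings of five genomic features on one latent variable.
loadings = [2.5, -0.3, 0.1, -1.8, 0.05]
sparse_loadings = sparsify(loadings, lam=0.5)
```

Features whose loadings fall below the penalty in absolute value are set exactly to zero, which is what lets such methods single out a small set of contributing features.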

    FACETS: Allele-Specific Copy Number and Clonal Heterogeneity Analysis Tool Estimates for High-Throughput DNA Sequencing

    Allele-specific copy number analysis (ASCN) from next generation sequencing (NGS) data can greatly extend the utility of NGS beyond the identification of mutations to precisely annotate the genome for the detection of homozygous/heterozygous deletions, copy-neutral loss-of-heterozygosity (LOH), and allele-specific gains/amplifications. In addition, as targeted gene panels are increasingly used in clinical sequencing studies for the detection of “actionable” mutations and copy number alterations to guide treatment decisions, accurate, tumor purity-, ploidy-, and clonal heterogeneity-adjusted integer copy number calls are greatly needed to more reliably interpret NGS-based cancer gene copy number data in the context of clinical sequencing. We developed FACETS, an ASCN tool and open-source software with a broad application to whole genome, whole-exome, as well as targeted panel sequencing platforms. It is a fully integrated stand-alone pipeline that includes sequencing BAM file post-processing, joint segmentation of total- and allele-specific read counts, and integer copy number calls corrected for tumor purity, ploidy and clonal heterogeneity, with comprehensive output and integrated visualization. We demonstrate the application of FACETS using the Cancer Genome Atlas (TCGA) whole-exome sequencing of lung adenocarcinoma samples. We also demonstrate its application to a clinical sequencing platform based on a targeted gene panel.
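To make the idea of purity- and ploidy-corrected copy number concrete, the sketch below uses a commonly used tumor/normal mixture model for the log2 read-depth ratio and inverts it to recover total copy number. It assumes a diploid normal contaminant and is not taken from the FACETS implementation. The function names and example values are hypothetical.

```python
import math


def expected_log_ratio(total_cn: float, purity: float, ploidy: float) -> float:
    """Expected log2 ratio for a segment, assuming normal cells carry 2 copies."""
    segment_signal = purity * total_cn + 2.0 * (1.0 - purity)
    genome_average = purity * ploidy + 2.0 * (1.0 - purity)
    return math.log2(segment_signal / genome_average)


def total_copy_number(log_ratio: float, purity: float, ploidy: float) -> float:
    """Invert the mixture model to get the (unrounded) tumor copy number."""
    genome_average = purity * ploidy + 2.0 * (1.0 - purity)
    return (2.0 ** log_ratio * genome_average - 2.0 * (1.0 - purity)) / purity


# Hypothetical segment: 4 copies in a 50%-pure, diploid-on-average tumor.
observed = expected_log_ratio(total_cn=4, purity=0.5, ploidy=2.0)
recovered = total_copy_number(observed, purity=0.5, ploidy=2.0)
integer_call = round(recovered)
```

Ignoring purity would read the same log ratio as roughly 3 copies, which shows why the adjustment matters for clinical interpretation.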

    Statistical Methods in Cancer Genomics.

    Genomic and proteomic experiments have become widely applied in cancer profiling studies over the past decade. The genomics era is marked by the success of using DNA microarrays to delineate genome-scale gene expression patterns to pinpoint disease mechanism at the molecular level. An increasing number of studies have profiled tumor specimens using distinct microarray platforms and analysis techniques. With the accumulating amount of microarray data, integrative analysis has the potential to identify common gene expression patterns across data sets and tissue types. In this proposal, I introduce a Bayesian mixture model-based approach for meta-analysis of microarray studies. A probabilistic measure of gene differential expression is used as a scaleless quantity for an integrative analysis of DNA microarray data sets across platforms and laboratories. The role of DNA microarrays has been primarily on the discovery side to screen through thousands of genes for potential disease biomarkers. In this respect, Tissue Microarrays (TMAs) have provided a proteomic platform for downstream validation studies of these target discoveries. The other part of this proposal concerns an implementation of measurement error models for patient survival outcome analysis using TMA expression data. Two goals are explored: 1) in a two-stage approach, a Latent Expression Index (LEI) is introduced as a summary index for the TMA repeated expression measures; 2) a joint model of survival and TMA expression data is established via a shared random effect. Bayesian estimation is carried out using a Markov Chain Monte Carlo (MCMC) method. As an extension to the measurement error models, I further propose a Cell Mixture model to allow a wider range of inferences for TMA expression data. Ph.D., Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/57619/2/rlshen_1.pd

    Prognostic meta-signature of breast cancer developed by two-stage mixture modeling of microarray data

    BACKGROUND: An increasing number of studies have profiled tumor specimens using distinct microarray platforms and analysis techniques. With the accumulating amount of microarray data, one of the most intriguing yet challenging tasks is to develop robust statistical models to integrate the findings. RESULTS: By applying a two-stage Bayesian mixture modeling strategy, we were able to assimilate and analyze four independent microarray studies to derive an inter-study validated "meta-signature" associated with breast cancer prognosis. Combining multiple studies (n = 305 samples) on a common probability scale, we developed a 90-gene meta-signature, which was strongly associated with survival in breast cancer patients. Given the set of independent studies using different microarray platforms, which included spotted cDNAs, Affymetrix GeneChip, and inkjet oligonucleotides, the individually identified classifiers yielded gene sets predictive of survival in each study cohort. The study-specific gene signatures, however, had minimal overlap with each other, and performed poorly in pairwise cross-validation. The meta-signature, on the other hand, accommodated such heterogeneity and achieved comparable or better prognostic performance when compared with the individual signatures. Further, in comparison with a global standardization method, the mixture model based data transformation demonstrated superior properties for data integration and provided a solid basis for building classifiers at the second stage. Functional annotation revealed that genes involved in cell cycle and signal transduction activities were over-represented in the meta-signature. CONCLUSION: The mixture modeling approach unifies disparate gene expression data on a common probability scale, allowing for robust, inter-study validated prognostic signatures to be obtained. With the emerging utility of microarrays for cancer prognosis, it will be important to establish paradigms to meta-analyze disparate gene expression data for prognostic signatures of potential clinical use.

    Pathway analysis reveals functional convergence of gene expression profiles in breast cancer

    Background: A recent study has shown high concordance of several breast-cancer gene signatures in predicting disease recurrence despite minimal overlap of the gene lists. This raises the question of whether there are common themes underlying such prediction concordance that are not apparent at the individual gene level. We therefore studied the similarity of these gene signatures on the basis of their functional annotations. Results: We found that the signatures did not identify the same set of genes but converged on the activation of a similar set of oncogenic and clinically relevant pathways. A clear and consistent pattern across the four breast cancer signatures is the activation of the estrogen-signaling pathway. Other common features include BRCA1-regulated pathway, reck pathways, and insulin signaling associated with the ER-positive disease signatures, all providing possible explanations for the prediction concordance. Conclusion: This work explains why independent breast cancer signatures that appear to perform equally well at predicting patient prognosis show minimal overlap in gene membership.

    Variance prior specification for a basket trial design using Bayesian hierarchical modeling

    Background: In the era of targeted therapies, clinical trials in oncology are rapidly evolving, wherein patients from multiple diseases are now enrolled and treated according to their genomic mutation(s). In such trials, known as basket trials, the different disease cohorts form the different baskets for inference. Several approaches have been proposed in the literature to efficiently use information from all baskets while simultaneously screening to find individual baskets where the drug works. Most proposed methods are developed in a Bayesian paradigm that requires specifying a prior distribution for a variance parameter, which controls the degree to which information is shared across baskets. Methods: A common method used to capture the correlated endpoints across baskets is Bayesian hierarchical modeling. We evaluate a Bayesian adaptive design in the context of a basket trial and investigate two popular prior specifications: an inverse-gamma prior on the basket-level variance and a uniform prior on the basket-level standard deviation. Results: From our simulation study, we see the inverse-gamma prior is highly sensitive to the input hyperparameters. When the prior mean value of the variance parameter is set to be near zero (<0.5), this can lead to unacceptably high false positive rates (>40%) in some scenarios. Thus, use of this prior requires a fully comprehensive sensitivity analysis before implementation. Alternatively, we see that a prior that moves the mass of the variance parameter away from zero, such as the uniform prior, displays desirable and robust operating characteristics over a wide range of prior specifications, with the caveat that the upper bound of the uniform prior must be larger than 1. Conclusion: Based on our results, we recommend that those involved in designing basket trials that implement hierarchical modeling avoid using a prior distribution that places a large density mass near zero for the variance parameter. Priors with this property force the model to share information regardless of the true efficacy configuration of the baskets. Many commonly used inverse-gamma prior specifications have this undesirable property. We recommend instead considering the more robust uniform prior on the standard deviation.
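The contrast between the two priors in the abstract above can be illustrated by asking how much prior mass each places on small values of the basket-level variance. The sketch below estimates this by Monte Carlo for an inverse-gamma prior and computes it in closed form for a uniform prior on the standard deviation. The hyperparameters (shape 2, scale 0.1, upper bound 2) and the threshold 0.5 are illustrative assumptions, not the settings used in the study.

```python
import math
import random


def inverse_gamma_mass_below(threshold: float, shape: float, scale: float,
                             n_draws: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(variance < threshold) for variance ~ InvGamma(shape, scale)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        # If precision ~ Gamma(shape, rate=scale), then 1/precision ~ InvGamma(shape, scale).
        # random.gammavariate takes (shape, scale), so the Gamma scale is 1/scale here.
        precision = rng.gammavariate(shape, 1.0 / scale)
        if precision > 1.0 / threshold:  # same event as variance < threshold
            hits += 1
    return hits / n_draws


def uniform_sd_mass_below(threshold: float, upper: float) -> float:
    """Exact P(sd**2 < threshold) for sd ~ Uniform(0, upper)."""
    return min(math.sqrt(threshold) / upper, 1.0)


# Hypothetical hyperparameters; the inverse-gamma prior mean is scale / (shape - 1) = 0.1.
ig_mass = inverse_gamma_mass_below(0.5, shape=2.0, scale=0.1)
unif_mass = uniform_sd_mass_below(0.5, upper=2.0)
```

Under these assumed settings the inverse-gamma prior puts nearly all of its mass below 0.5, while the uniform prior puts roughly a third there, which is the kind of difference the abstract links to forced information sharing.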

    Modeling intra-tumor protein expression heterogeneity in tissue microarray experiments

    Tissue microarrays (TMAs) measure tumor-specific protein expression via high-density immunohistochemical staining assays. They provide a proteomic platform for validating cancer biomarkers emerging from large-scale DNA microarray studies. Repeated observations within each tumor result in substantial biological and experimental variability. This variability is usually ignored when associating the TMA expression data with patient survival outcome, which generates biased estimates of the hazard ratio in proportional hazards models. We propose a Latent Expression Index (LEI) as a surrogate protein expression estimate in a two-stage analysis. Several estimators of LEI are compared: an empirical Bayes, a full Bayes, and a varying replicate number estimator. In addition, we jointly model survival and TMA expression data via a shared random effects model. Bayesian estimation is carried out using a Markov chain Monte Carlo method. Simulation studies were conducted to compare the two-stage methods and the joint analysis in estimating the Cox regression coefficient. We show that the two-stage methods reduce bias relative to the naive approach, but still lead to under-estimated hazard ratios. The joint model consistently outperforms the two-stage methods in terms of both bias and coverage property in various simulation scenarios. In case studies using prostate cancer TMA data sets, the two-stage methods yield a good approximation in one data set but an insufficient one in the other. The general advice is to use the joint model inference whenever results differ between the two-stage methods and the joint analysis. Copyright © 2008 John Wiley & Sons, Ltd. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/58565/1/3217_ftp.pd
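One way to picture a summary index for replicated TMA measurements is the textbook shrinkage estimator from a one-way random effects model, sketched below. The abstract does not give the formulas for the LEI estimators, so this is only a generic stand-in. The function name, variance components and example values are hypothetical.

```python
from statistics import mean


def shrunken_expression(replicates: list[float], grand_mean: float,
                        between_var: float, within_var: float) -> float:
    """Shrink a tumor's replicate mean toward the grand mean.

    The weight grows with the number of replicates, so tumors with
    fewer cores are pulled more strongly toward the population average.
    """
    n = len(replicates)
    weight = between_var / (between_var + within_var / n)
    return grand_mean + weight * (mean(replicates) - grand_mean)


# Hypothetical staining scores: same replicate mean, different replicate counts.
one_core = shrunken_expression([3.0], grand_mean=1.0, between_var=1.0, within_var=1.0)
four_cores = shrunken_expression([3.0, 3.0, 3.0, 3.0], grand_mean=1.0,
                                 between_var=1.0, within_var=1.0)
```

With a single core the estimate moves halfway back to the grand mean, whereas four cores retain most of the observed signal.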

    A Latent Variable Approach for Meta-Analysis of Gene Expression Data from Multiple Microarray Experiments

    Background: With the explosion in data generated using microarray technology by different investigators working on similar experiments, it is of interest to combine results across multiple studies. Results: In this article, we describe a general probabilistic framework for combining high-throughput genomic data from several related microarray experiments using mixture models. A key feature of the model is the use of latent variables that represent quantities that can be combined across diverse platforms. We consider two methods for estimation of an index termed the probability of expression (POE). The first, reported in previous work by the authors, involves Markov Chain Monte Carlo (MCMC) techniques. The second method is a faster algorithm based on the expectation-maximization (EM) algorithm. The methods are illustrated with application to a meta-analysis of datasets for metastatic cancer. Conclusion: The statistical methods described in the paper are available as an R package, metaArray 1.8.1, which is at Bioconductor, whose URL is http://www.bioconductor.org/.
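The EM idea behind a probability-scale expression index can be sketched with a plain two-component Gaussian mixture, where the posterior probability of the higher component plays the role of a platform-free score. This is a simplified stand-in, not the metaArray algorithm, and the published model may use different component distributions. All names and the toy data are hypothetical.

```python
import math
from statistics import pstdev


def normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_em(values: list[float], n_iter: int = 50):
    """Fit a 1-D two-component Gaussian mixture by EM.

    Returns (means, sds, weights, posteriors); posteriors[i] is the
    probability, from the final E-step, that values[i] belongs to the
    higher-mean component.
    """
    mu = [min(values), max(values)]          # deterministic initialisation
    start_sd = pstdev(values) or 1.0
    sd = [start_sd, start_sd]
    weight = [0.5, 0.5]
    post: list[float] = []
    for _ in range(n_iter):
        # E-step: posterior probability of the "high" component for each value.
        post = []
        for x in values:
            p_low = weight[0] * normal_pdf(x, mu[0], sd[0])
            p_high = weight[1] * normal_pdf(x, mu[1], sd[1])
            post.append(p_high / (p_low + p_high + 1e-300))
        # M-step: update weights, means and standard deviations.
        n_high = max(sum(post), 1e-12)
        n_low = max(len(values) - sum(post), 1e-12)
        mu = [sum((1 - r) * x for r, x in zip(post, values)) / n_low,
              sum(r * x for r, x in zip(post, values)) / n_high]
        var_low = sum((1 - r) * (x - mu[0]) ** 2 for r, x in zip(post, values)) / n_low
        var_high = sum(r * (x - mu[1]) ** 2 for r, x in zip(post, values)) / n_high
        sd = [max(math.sqrt(var_low), 1e-3), max(math.sqrt(var_high), 1e-3)]
        weight = [n_low / len(values), n_high / len(values)]
    return mu, sd, weight, post


# Hypothetical log-expression values: five baseline samples, three over-expressed.
toy = [0.0, 0.1, -0.1, 0.2, -0.2, 5.0, 5.1, 4.9]
means, sds, weights, posteriors = fit_two_component_em(toy)
```

Because the posteriors lie between 0 and 1 regardless of the measurement scale, scores of this kind can in principle be compared across platforms.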

    Sex-specific survival and tumor mutational burden in early stage melanoma

    Introduction: Tumor mutational burden (TMB) is a promising biomarker of clinical response to immune checkpoint inhibitors in metastatic cancers, and of melanoma-specific survival. There are also significant gender-specific differences in TMB, with men having consistently higher TMB than women. This relationship is provocative given the well-documented female melanoma survival advantage, and has not been investigated in early-stage primary tumors naïve to treatment. Approach: Here we present preliminary findings on sex, survival, and tumor mutational burden from Stage II and III primary melanoma tumors, none of which have received immunotherapy, using the MSK IMPACT™ next generation sequencing assay. Our team evaluated survival in 581 primary melanoma tumors procured by the parent P01 grant: 251 from patients who died with melanoma within five years (median survival, 2.4 years), and 330 from individuals who have lived at least five years (median follow-up, 8.5 years). Preliminary Results: In the full dataset, we found the expected female survival advantage (log rank test P=0.049). After controlling for multiple comparisons using maximally selected ranked statistics [7], the protective effect of high TMB on survival disappeared (HR=0.43, 95% CI=0.19 to 0.97, P=0.037). When stratified by sex, high TMB was associated with significantly improved melanoma-specific survival among men (P=0.024), but not women (P=0.9). Broader Impacts: Our study is the first to investigate the relationship between sex, tumor mutational burden, and mortality in an early-stage primary cohort that has not received immunotherapy. In our small sample, we observed the expected protective effect of TMB on survival, but no evidence of gender differences in TMB or survival, despite the robust, consistent, and well-documented female survival advantage [5,6]. Our results are an important first step to increasing our understanding of the relationship between mutational burden, survival, and biological sex. Limitations: These results are exploratory and have not been adjusted for potential confounding factors such as stage, Breslow score, gender, or age.